PURPOSE: To provide a table recognizing device 
capable of accurately segmenting respective frames 
means 11 and outputting data expressing the structure 
of the table. Since the structure of the table is recognized 
by using the arrangement of character blocks, the 
structure of the table can be accurately recognized even 
when part or the whole of vertical and horizontal ruled 
lines is omitted. 
* NOTICES * 

JPO and NCIPI are not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer, so the translation may not reflect the original 
precisely. 

2. **** shows a word which cannot be translated. 
3. In the drawings, no words are translated. 



CLAIMS 



[Claim(s)] 

[Claim 1] Table recognition equipment characterized by having an alphabetic block extract means to 
extract alphabetic blocks from a table image, and a physical relationship discernment means to identify 
the physical relationship between the alphabetic blocks extracted by said alphabetic block extract means 
and to generate data which express the structure of a table. 

[Claim 2] Table recognition equipment according to claim 1 characterized by establishing an alphabetic 
character and ruled line separation means to divide a table image into an alphabetic character image and 
a ruled line image, and inputting into said alphabetic block extract means the alphabetic character image 
separated by said alphabetic character and ruled line separation means. 

[Claim 3] Table recognition equipment according to claim 1 characterized in that said alphabetic block 
extract means has an alphabetic character rectangle extract means to obtain the rectangle field 
surrounding each lump of pixels with which an alphabetic character is written, and an alphabetic block 
rectangle extract means to unify one or more alphabetic character rectangles as an alphabetic block 
based on the distance between the alphabetic character rectangles obtained by the alphabetic character 
rectangle extract means. 

[Claim 4] Table recognition equipment according to claim 3 characterized in that said alphabetic block 
rectangle extract means finds the distance between the alphabetic character rectangles obtained by the 
alphabetic character rectangle extract means, and unifies as one alphabetic block a group of alphabetic 
character rectangles which follow one another at distances smaller than a certain threshold. 

[Claim 5] Table recognition equipment according to claim 1 characterized in that said alphabetic block 
extract means has an alphabetic character and ruled line separation means to separate the alphabetic 
characters and the ruled lines in a table, a ruled line vectorization means to vectorize the ruled lines 
separated by the alphabetic character and ruled line separation means, an alphabetic character field 
extract means to extract, as alphabetic character fields, the rectangle fields where alphabetic characters 
should be written, based on the vector data of the ruled lines obtained by the ruled line vectorization 
means, and an alphabetic block rectangle extract means to unify one or more alphabetic character 
rectangles as an alphabetic block based on the distance between the alphabetic character rectangles 
obtained by the alphabetic character rectangle extract means. 

[Claim 6] Table recognition equipment according to claim 1 characterized in that said physical 
relationship discernment means has a line sampling means to regard the alphabetic block rectangles 
extracted by the alphabetic block extract means as the configuration frames of a table and to identify the 
arrangement of said configuration frames in the line writing direction, and a train extract means to 
identify the arrangement of said configuration frames in the direction of a train. 

[Claim 7] Table recognition equipment according to claim 6 characterized in that said line sampling 
means extracts, as the same line, a group of configuration frames whose central y-coordinates are the 
same within a predetermined error range, and said train extract means extracts, as the same train, a 
group of configuration frames whose central x-coordinates are the same within a predetermined error 
range. 

[Claim 8] Table recognition equipment characterized by having a rectangle frame extract means to 
extract, from a target table field, the rectangle frames surrounded by the ruled lines which constitute a 
table, an alphabetic block extract means to extract alphabetic blocks from the target table field, and a 
physical relationship discernment means to identify the physical relationship between the rectangle 
frames extracted by said rectangle frame extract means and the alphabetic blocks extracted by said 
alphabetic block extract means. 

[Claim 9] Table recognition equipment according to claim 8 characterized by establishing an alphabetic 
character and ruled line separation means to divide a table image into an alphabetic character image and 
a ruled line image, inputting into said alphabetic block extract means the alphabetic character image 
separated by said alphabetic character and ruled line separation means, and inputting into said rectangle 
frame extract means the ruled line image separated by said alphabetic character and ruled line 
separation means. 

[Claim 10] Table recognition equipment according to claim 8 characterized in that said rectangle frame 
extract means has a ruled line vectorization means to change into vector data the ruled line image 
separated by the alphabetic character and ruled line separation means, a 1st rectangle frame extract 
means to obtain rectangle frames based on the connection relation of the ruled line vectors outputted by 
the ruled line vectorization means, and a 2nd rectangle frame extract means to presume rectangle frames 
in which some ruled lines are omitted, from ruled line vectors whose end is connected to no other ruled 
line vector. 

[Claim 11] Table recognition equipment according to claim 8 characterized in that said alphabetic block 
extract means has an alphabetic character rectangle extract means to obtain the rectangle field 
surrounding each lump of pixels with which an alphabetic character is written, and an alphabetic block 
rectangle extract means to find the distance between the alphabetic character rectangles obtained by the 
alphabetic character rectangle extract means and to unify one or more alphabetic character rectangles as 
an alphabetic block based on the distance. 

[Claim 12] Table recognition equipment according to claim 8 characterized in that said alphabetic block 
extract means has an alphabetic character field extract means to extract, as alphabetic character fields, 
the rectangle fields where alphabetic characters should be written, based on the output of the rectangle 
frame extract means, an alphabetic character rectangle extract means to obtain, in each alphabetic 
character field obtained by the alphabetic character field extract means, the rectangle field surrounding 
each lump of pixels with which an alphabetic character is written, and an alphabetic block rectangle 
extract means to unify one or more alphabetic character rectangles as an alphabetic block based on the 
distance between the alphabetic character rectangles obtained by the alphabetic character rectangle 
extract means. 

[Claim 13] Table recognition equipment according to claim 8 characterized in that said physical 
relationship discernment means has a configuration frame discernment means to identify the 
configuration frames which constitute a table from the rectangle frames consisting of ruled lines of the 
table extracted by said rectangle frame extract means and the alphabetic block rectangles extracted by 
the alphabetic block extract means, a line sampling means to identify the arrangement in the line writing 
direction of the configuration frames identified by the configuration frame discernment means, and a 
train extract means to identify the arrangement in the direction of a train of those configuration frames. 

[Claim 14] Table recognition equipment according to claim 13 characterized in that said configuration 
frame discernment means extracts, for each rectangle frame extracted by said rectangle frame extract 
means, the alphabetic blocks within the rectangle frame, determines each of the alphabetic blocks as a 
configuration frame when there are two or more alphabetic blocks, and determines the rectangle frame 
as a configuration frame when there is a single alphabetic block. 

[Claim 15] Table recognition equipment according to claim 13 characterized in that said line sampling 
means extracts, as the same line, a group of configuration frames whose central y-coordinates are the 
same within a predetermined error range, and said train extract means extracts, as the same train, a 
group of configuration frames whose central x-coordinates are the same within a predetermined error 
range. 






DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to table recognition equipment which recognizes the 
structure of a table from a table image in the field of document image processing. 

[0002] 

[Description of the Prior Art] As conventional methods of table recognition, a method which uses the 
marginal distribution of a table field, and a method which changes the ruled lines constituting a table 
into vector line segments and extracts the rectangle frames surrounded by the ruled lines, are known. An 
example of the method which uses marginal distribution is given in JP,2-61775,A, and an example of the 
method which uses vector line segments is given in JP,1-129358,A. 

[0003] The method given in JP,2-61775,A, which uses marginal distribution, takes the marginal 
distribution of the image of a table field, presumes the locations of ruled lines from the crests in the 
histogram of the marginal distribution whose height exceeds a certain threshold, and takes out as the 
outer frame the ruled lines located in the outermost part of the table. Next, it obtains a ruled line which 
touches this outer frame at both ends, and that ruled line divides the outer frame into two or more 
rectangle frames. Furthermore, the rectangle frames surrounded by ruled lines are extracted by 
recursively performing the same processing within each of the divided rectangle frames. The latter 
method, given in JP,1-129358,A, recognizes a table by investigating the physical relationship of the 
rectangle frames taken out by tracing the vector line segments. 
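
As a rough illustration of the marginal distribution idea summarized in [0003], the following Python 
sketch counts the black pixels in each row of a binary table image and reports the rows whose count 
exceeds a threshold as presumed horizontal ruled lines. The function name, the list-of-lists image and 
the threshold value are assumptions made for this sketch and are not taken from JP,2-61775,A. 

```python
def ruled_line_rows(image, threshold):
    """Return the y positions whose count of black pixels exceeds threshold.

    image is a list of rows; 1 marks a character/line pixel, 0 the background
    (an assumed representation for this sketch).
    """
    # Marginal distribution along y: number of black pixels in each row.
    profile = [sum(row) for row in image]
    # Crests higher than the threshold are presumed to be ruled lines.
    return [y for y, count in enumerate(profile) if count > threshold]


image = [
    [1, 1, 1, 1, 1, 1],   # top ruled line
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],   # bottom ruled line
]
rows = ruled_line_rows(image, threshold=4)
```

Vertical ruled lines could be presumed the same way from column sums; the recursive division into 
rectangle frames is left out of the sketch. 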

[0004] These methods are premised on no ruled line of the table being omitted, but a fair number of the 
tables actually used in documents have some of their ruled lines omitted. The method given in 
JP,2-264386,A can take out rectangle frames correctly even when the ruled lines on both sides of a table 
are omitted. That is, it distinguishes, from the vertical and horizontal ruled lines taken out from the table 
image, whether ruled lines are present on both sides of the table, and when they are absent it virtually 
generates vertical ruled lines on both sides of the table. 
[0005] 

[Problem(s) to be Solved by the Invention] Tables of various forms are used in documents. Drawing 2 
shows examples: (a) is a table in which all the ruled lines are present, (b) is a table in which the ruled 
lines on both sides are omitted, (c) and (d) are tables in which vertical and horizontal ruled lines are 
omitted besides the ruled lines on both sides, and (e) is a table in which all the ruled lines are omitted. 
Among these, the tables of (a) and (b) can be handled by the prior art. However, when vertical and 
horizontal ruled lines are omitted besides the ruled lines on both sides, as in the tables of (c) and (d), or 
when all the ruled lines are omitted, as in the table of (e), the structure of the table could not be 
recognized accurately, and character strings could not be taken out in units which are meaningful as a 
table. The object of this invention is to offer table recognition equipment which can accurately cut out 
each frame which constitutes a table, even for a table in which part or all of the vertical and horizontal 
ruled lines are omitted. 
[0006] 

[Means for Solving the Problem and its Function] As its fundamental configuration, the table 
recognition equipment of this invention is equipped with an alphabetic block extract means (11 of 
drawing 1, 81 of drawing 8) to extract alphabetic blocks from a table image, and a physical relationship 
discernment means (12 of drawing 1, 82 of drawing 8) to identify the physical relationship between the 
alphabetic blocks extracted by said alphabetic block extract means and to output data showing the 
structure of the table. According to this invention, the physical relationship between the alphabetic 
blocks extracted by the alphabetic block extract means is identified by the physical relationship 
discernment means. Since the alphabetic blocks in a table are, as components of the table, generally 
aligned regularly, the structure of the table can be recognized by examining the physical relationship 
between the alphabetic blocks. Conventionally, the frames which constitute a table were obtained by 
paying attention only to the ruled lines of the table, so there was a problem that the structure of a table 
in which part or all of the vertical and horizontal ruled lines are omitted could not be recognized 
accurately. Since this invention recognizes the structure of a table using the arrangement of alphabetic 
blocks, that problem can be solved. 

[0007] According to one mode of this invention, in the aforementioned fundamental configuration, said 
alphabetic block extract means has an alphabetic character rectangle extract means (111 of drawing 1) to 
obtain the rectangle field surrounding each lump of pixels with which an alphabetic character is written, 
and an alphabetic block rectangle extract means (112 of drawing 1) to find the distance between the 
alphabetic character rectangles obtained by the alphabetic character rectangle extract means and to 
unify, as one alphabetic block, all alphabetic character rectangles whose distance is smaller than a 
threshold. The threshold may be determined by investigating statistics of the distances between all the 
alphabetic character rectangles, or may be set to a certain percentage of the character width. 

[0008] According to another mode of this invention, in the aforementioned fundamental configuration, 
said alphabetic block extract means has a ruled line vectorization means (811 of drawing 8) to separate 
the alphabetic characters and the ruled lines in a table and to vectorize the ruled lines, an alphabetic 
character field extract means (812 of drawing 8) to extract, as alphabetic character fields, the rectangle 
fields where alphabetic characters should be written, based on the vector data of the ruled lines obtained 
by the ruled line vectorization means, an alphabetic character rectangle extract means (813 of drawing 8) 
to obtain, in each alphabetic character field obtained by the alphabetic character field extract means, the 
rectangle field surrounding each lump of pixels with which an alphabetic character is written, and an 
alphabetic block rectangle extract means (814 of drawing 8) to find the distance between the alphabetic 
character rectangles obtained by the alphabetic character rectangle extract means and to unify, as one 
alphabetic block, all alphabetic character rectangles whose distance is smaller than a certain threshold. 
This configuration adds the ruled line vectorization means and the alphabetic character field extract 
means to the alphabetic block extract means explained in the preceding paragraph [0007]. According to 
this mode, the ruled lines are obtained by the ruled line vectorization means, the alphabetic character 
field extract means investigates the fields bounded by the ruled lines and grasps each alphabetic 
character field where alphabetic characters should be written, and the alphabetic character rectangles are 
extracted within each of those alphabetic character fields, so alphabetic blocks can be extracted with 
sufficient precision. 
[0009] According to another mode of this invention, in the aforementioned fundamental configuration, 
said physical relationship discernment means has a line sampling means (121 of drawing 1, 821 of 
drawing 8) to use as configuration frames the alphabetic block rectangles extracted by alphabetic block 
extract processing and to identify the arrangement of the configuration frames in the line writing 
direction, and a train extract means (122 of drawing 1, 822 of drawing 8) to identify the arrangement in 
the direction of a train of the configuration frames which constitute the table extracted by the 
configuration frame discernment means. Moreover, in a more concrete mode, said line sampling means 
is constituted so that a group of configuration frames whose central y-coordinates are the same within a 
predetermined error range is extracted as the same line, and said train extract means is constituted so 
that a group of configuration frames whose central x-coordinates are the same within a predetermined 
error range is extracted as the same train. Since the alphabetic blocks in a table are generally arranged 
along the lines and trains of the table, using the alphabetic block rectangles as configuration frames in 
this way makes it possible to investigate their arrangement in the line and train directions, and the 
components of the table structure can be extracted by grouping them into lines and trains. 
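
The grouping described in [0009] can be sketched in Python as follows, reading the "line" and "train" 
of this translation as a table row and a table column. The rectangle representation (x, y, width, height), 
the function name, the tolerance value, and the choice to compare each frame with the first frame of the 
current group are assumptions of this sketch, not details stated in the text. 

```python
def group_by_center(frames, axis, tolerance):
    """Group frames whose center coordinate on `axis` agrees within tolerance.

    frames: list of (x, y, width, height) rectangles (assumed representation).
    axis: "y" groups frames into lines, "x" groups them into trains.
    """
    def center(frame):
        x, y, w, h = frame
        return y + h / 2 if axis == "y" else x + w / 2

    groups = []
    for frame in sorted(frames, key=center):
        # Join the current group when the center is close to that of the
        # group's first frame (assumed comparison rule); otherwise start anew.
        if groups and abs(center(frame) - center(groups[-1][0])) <= tolerance:
            groups[-1].append(frame)
        else:
            groups.append([frame])
    return groups


frames = [(0, 0, 10, 4), (20, 1, 10, 4), (0, 10, 10, 4), (21, 10, 8, 4)]
lines = group_by_center(frames, "y", tolerance=2)
trains = group_by_center(frames, "x", tolerance=2)
```

With these four frames the sketch yields two lines and two trains, which corresponds to a 2 x 2 table 
without any ruled lines. 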

[0010] Furthermore, in another mode of this invention, a rectangle frame extract means to extract the 
rectangle frames surrounded by ruled lines is further established in the aforementioned fundamental 
configuration, and the physical relationship discernment means treats the rectangle frames surrounded 
by ruled lines and the alphabetic blocks in the table equally and identifies their physical relationship. 
Namely, this table recognition equipment has an alphabetic block extract means (113 of drawing 11) to 
extract alphabetic blocks from a table image, a rectangle frame extract means (112 of drawing 11) to 
extract from the table image the rectangle frames surrounded by the ruled lines which constitute the 
table, and a physical relationship discernment means (114 of drawing 11) to identify the physical 
relationship between the rectangle frames extracted by said rectangle frame extract means and the 
alphabetic blocks extracted by said alphabetic block extract means and to create data showing the 
structure of the table. Since the rectangle frames surrounded by the ruled lines of a table and the 
alphabetic blocks in the table are treated equally, even a frame which is not surrounded by ruled lines is 
identified, as an alphabetic block, to be one component of the table. The tables of (c), (d), and (e) in 
drawing 2 can therefore be recognized accurately, as well as the tables of (a) and (b). 
[0011] In the above-mentioned invention, the rectangle frame extract means has a ruled line 
vectorization means (1121 of drawing 8) to change a ruled line image into vector data, a 1st rectangle 
frame extract means (1121 of drawing 11) to obtain rectangle frames based on the connection relation 
of the ruled line vectors outputted by the ruled line vectorization means, and a 2nd rectangle frame 
extract means (1123 of drawing 11) to extract rectangle frames in which some ruled lines are omitted, 
from ruled line vectors whose end is connected to no other ruled line vector. 

[0012] According to one mode of the above-mentioned invention, the physical relationship discernment 
means has a configuration frame discernment means (1141 of drawing 11) to identify the configuration 
frames which constitute a table from the rectangle frames consisting of ruled lines of the table extracted 
by said rectangle frame extract means and the alphabetic block rectangles extracted by alphabetic block 
extract processing, a line sampling means (1142 of drawing 11) to identify the arrangement in the line 
writing direction of the configuration frames identified by the configuration frame discernment means, 
and a train extract means (1143 of drawing 11) to identify the arrangement in the direction of a train of 
those configuration frames. Moreover, in a concrete mode, for each rectangle frame extracted by said 
rectangle frame extract means, the configuration frame discernment means extracts the alphabetic 
blocks within the rectangle frame; when there are two or more alphabetic blocks it determines each of 
them as a configuration frame, and when there is a single alphabetic block it determines the rectangle 
frame as a configuration frame. By determining (recognizing) the configuration frames in this way, the 
configuration frames which are the components of a table can be determined accurately even when 
there is a rectangle frame inside which ruled lines are partly omitted, as shown in (d) of drawing 2. 
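
The rule of [0012] for determining configuration frames might look as follows in Python. This is only a 
sketch: the (x, y, width, height) rectangles, the function names and the containment test are 
assumptions, and rectangle frames holding no alphabetic block, as well as alphabetic blocks lying outside 
every rectangle frame, are deliberately left out because the paragraph does not describe them. 

```python
def contains(outer, inner):
    """True when rectangle inner lies inside rectangle outer; both (x, y, w, h)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def configuration_frames(rect_frames, blocks):
    """Apply the two cases of [0012] to every ruled-line rectangle frame."""
    frames = []
    for rect in rect_frames:
        inside = [b for b in blocks if contains(rect, b)]
        if len(inside) >= 2:
            # Two or more alphabetic blocks: each block is a configuration frame.
            frames.extend(inside)
        elif len(inside) == 1:
            # A single alphabetic block: the rectangle frame itself is the frame.
            frames.append(rect)
    return frames


rect_frames = [(0, 0, 100, 20), (0, 20, 100, 20)]
blocks = [(5, 5, 20, 10), (60, 5, 20, 10), (5, 25, 20, 10)]
result = configuration_frames(rect_frames, blocks)
```

In the example the upper rectangle frame lacks an inner vertical ruled line and holds two blocks, so the 
two blocks become configuration frames, while the lower rectangle frame is kept as it is. 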
[0013] 
[Example] 

(The 1st example) Drawing 1 is a drawing showing the configuration of the 1st example of this 
invention. The table recognition equipment of this example investigates the arrangement of alphabetic 
blocks, each consisting of a series of alphabetic characters, and recognizes the structure of a table. As 
shown in drawing 1, it is equipped with the alphabetic block extract section 11, which extracts 
alphabetic blocks from the alphabetic character image in a table image, and the physical relationship 
discernment section 12, which identifies the physical relationship between the alphabetic blocks 
extracted by the alphabetic block extract section 11 and obtains data which express the structure of the 
table. 
[0014] The alphabetic block extract section 11 consists of the alphabetic character rectangle extract 
processing section 111, which obtains the rectangle field surrounding each lump of pixels with which an 
alphabetic character is written, and the alphabetic block rectangle extract processing section 112, which 
finds the distance between the alphabetic character rectangles obtained by the alphabetic character 
rectangle extract processing section 111 and unifies alphabetic character rectangles whose distance is 
smaller than a threshold as an alphabetic block. Moreover, the physical relationship discernment section 
12 consists of the line sampling processing section 121, which receives the alphabetic block rectangles 
extracted by alphabetic block extract processing as configuration frames and identifies their 
arrangement in the line writing direction, the train extract processing section 122, which identifies the 
arrangement of the configuration frames in the direction of a train, and the table structure storage 
section 123, which memorizes the discernment result of the physical relationship. 
[0015] The processing of each part of this example constituted as mentioned above is explained in 
detail. The image processed in this example is a table image obtained by separating the table field from 
a document image containing a table, inputted by a picture input device such as an image scanner. 
Means of separating a table field include specification by an operator with a pointing device such as a 
mouse on a screen, and a table field separation device which separates the field automatically based on 
the attributes of the image (for example, refer to JP,2-210586,A); both are well-known techniques. From 
the alphabetic character image part of the table image, the alphabetic character rectangle extract 
processing section 111 obtains the rectangle fields 35, 36, 37, and 38 surrounding the lumps 31, 32, 33, 
and 34 of pixels with which characters are written, as shown in (a) and (b) of drawing 3. That is, when 
the image of the table is written with a pixel value of 0 for the background and a pixel value of 1 for 
alphabetic characters and lines, the lumps whose pixel value is 1 are taken out and their rectangle fields 
are obtained. When two rectangle fields overlap at this time, they are expressed by the rectangle field 39 
which includes the two rectangle fields 37 and 38, as shown in (b) of drawing 3. In addition, since the 
method of extracting the rectangle field of an alphabetic character is a well-known technique (for 
example, refer to JP,2-267678,A), detailed explanation is omitted. 
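
Since the text leaves the rectangle extraction to a known technique, the following Python sketch shows 
only one possible way to do it: a breadth-first search collects each lump of pixels with value 1 and 
records its surrounding rectangle, and a helper forms the rectangle that includes two overlapping 
rectangles, in the manner of rectangle field 39. The use of 8-connectivity, the function names and the 
(x, y, width, height) representation are assumptions of the sketch and are not taken from 
JP,2-267678,A. 

```python
from collections import deque


def character_rectangles(image):
    """Surrounding rectangles (x, y, w, h) of connected lumps of 1-valued pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first search over the 8-connected lump starting here.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xmin = xmax = x0
            ymin = ymax = y0
            while queue:
                x, y = queue.popleft()
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((xmin, ymin, xmax - xmin + 1, ymax - ymin + 1))
    return rects


def enclosing(a, b):
    """Smallest rectangle that includes both rectangles a and b."""
    x, y = min(a[0], b[0]), min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (x, y, right - x, bottom - y)


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
rects = character_rectangles(image)
```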

[0016] Furthermore, the alphabetic block rectangle extract processing section 112 finds the distance 
between the alphabetic character rectangles obtained by the alphabetic character rectangle extract 
processing section 111, and unifies all alphabetic character rectangles whose distance is smaller than a 
certain threshold as one alphabetic block. The threshold used in this processing may be determined by 
investigating statistics of the distances between all the alphabetic character rectangles, or may be set to 
several percent of the graphic size; the way of deciding the threshold is not specifically limited here. 
Applying this processing to the table without ruled lines shown in (a) of drawing 4 gives the result 
shown in (b) of that drawing. An identifier is given to each rectangle frame of the alphabetic blocks 
obtained by this processing, and each is accumulated in memory as data together with the location 
(x-coordinate, y-coordinate), width of face, height, etc. of the rectangle frame. 

[0017] Drawing 5a and drawing 5b are drawings showing the flow of processing of the alphabetic block 
rectangle extract processing section 112. Drawing 5a shows the procedure for calculating said threshold 
for gathering alphabetic character rectangles into blocks. As storage for the constant and the 
intermediate results required for processing, a constant N, the variable sumw which stores the total of 
the widths of face of the alphabetic character rectangles, the variable sumh which stores the total of 
their heights, the thresholds Tw and Th of width of face and height, and a variable i are prepared. First, 
as initial setting, N is set to the total number of alphabetic character rectangles extracted by the 
alphabetic character rectangle extract processing section 111, and sumw, sumh, and i are each set to 0 
(step 501). It is then judged whether i has reached N (step 502); while i<N, the width of face and height 
of the alphabetic character rectangle Ci are added to sumw and sumh (step 503), i is increased by 1 (step 
505), and this processing is repeated until i<N no longer holds. The totals are divided by 2N (step 504), 
so that the thresholds Tw and Th of width of face and height are acquired as one half of the average 
width of face and one half of the average height of the alphabetic character rectangles, respectively. 
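
The threshold calculation of drawing 5a, as described in [0017], amounts to the short Python sketch 
below. It reads the untranslated operation of step 504 as a division of the totals by 2N, which agrees 
with the stated result of one half of the average; that reading, the function name and the 
(x, y, width, height) rectangles are assumptions of the sketch. 

```python
def block_thresholds(char_rects):
    """Return (Tw, Th): half the average width and half the average height.

    char_rects: non-empty list of (x, y, width, height) character rectangles
    (assumed representation).
    """
    n = len(char_rects)                          # step 501: N
    sumw = sum(w for _, _, w, _ in char_rects)   # steps 502-505: total width
    sumh = sum(h for _, _, _, h in char_rects)   # steps 502-505: total height
    # Step 504 (assumed reading): divide the totals by 2N.
    return sumw / (2 * n), sumh / (2 * n)


tw, th = block_thresholds([(0, 0, 8, 10), (10, 0, 12, 14)])
```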
[0018] Once the thresholds are acquired by the processing of drawing 5a, the processing of drawing 5b 
gathers the alphabetic character rectangles into blocks. Variables j and B are set to 0 (step 506). It is 
judged whether the alphabetic character rectangle Cj is already registered in one of the alphabetic blocks 
CB (step 507). If it is registered, variable j is increased by 1 in order to process the following alphabetic 
character rectangle (step 517). If the alphabetic character rectangle Cj has not been registered yet, it is 
registered into the alphabetic block CBB (step 508), and becomes the head alphabetic character 
rectangle of that alphabetic block CBB. Next, alphabetic character rectangles whose distance from the 
registered alphabetic character rectangles is within the threshold Tw or Th are searched for and 
registered into the alphabetic block CBB. For this purpose, variable k is first set to j (step 509), and it is 
investigated whether the alphabetic character rectangle Ck is registered in one of the alphabetic blocks 
(step 510). If it is not registered, the distance D between CBB and Ck is found (step 511), and it is 
investigated whether the distance D is within the threshold Tw or Th (step 512). If the distance D is 
within the threshold Tw or Th, the alphabetic character rectangle Ck is added to the alphabetic block 
CBB and the magnitude of CBB is changed (step 513). When Ck is judged at step 510 to be registered, 
when the distance D is judged at step 512 not to be within the threshold Tw or Th, and when the 
additional processing of step 513 is finished, k is set to k+1 in order to examine the following alphabetic 
character rectangle (step 514). It is then judged whether all alphabetic character rectangles have been 
processed (step 515); if not, the processing of steps 510-514 is repeated for the following alphabetic 
character rectangle. When k<N no longer holds at step 515, B is set to B+1 in order to obtain the 
following alphabetic block (step 516), and j is set to j+1 (step 517). While j<N, the processing of steps 
507-518 is continued, and processing ends when j<N no longer holds (step 519). 
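
The flow of drawing 5b described in [0018] can be approximated by the Python sketch below. The text 
does not define the distance D, so the sketch assumes that D is judged separately as a horizontal gap 
against Tw and a vertical gap against Th, and that both must be within their thresholds; this reading, the 
function names and the (x, y, width, height) rectangles are assumptions, and the step numbers in the 
comments only indicate the approximate correspondence. 

```python
def gap(a_start, a_len, b_start, b_len):
    """Gap between two 1-D intervals; 0 when they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_start + a_len, b_start + b_len))


def merge_into_blocks(char_rects, tw, th):
    """Gather character rectangles into alphabetic block rectangles."""
    registered = [False] * len(char_rects)
    blocks = []
    for j, cj in enumerate(char_rects):
        if registered[j]:
            continue                       # step 507: already in some block
        block = cj                         # step 508: Cj heads a new block
        registered[j] = True
        for k in range(j + 1, len(char_rects)):
            if registered[k]:
                continue                   # step 510
            ck = char_rects[k]
            dx = gap(block[0], block[2], ck[0], ck[2])   # step 511
            dy = gap(block[1], block[3], ck[1], ck[3])
            if dx <= tw and dy <= th:      # step 512 (assumed reading)
                # Step 513: add Ck and enlarge the block rectangle.
                x = min(block[0], ck[0])
                y = min(block[1], ck[1])
                right = max(block[0] + block[2], ck[0] + ck[2])
                bottom = max(block[1] + block[3], ck[1] + ck[3])
                block = (x, y, right - x, bottom - y)
                registered[k] = True
        blocks.append(block)               # step 516: go on to the next block
    return blocks


char_rects = [(0, 0, 8, 10), (10, 0, 8, 10), (40, 0, 8, 10), (50, 0, 8, 10)]
blocks = merge_into_blocks(char_rects, tw=5, th=5)
```

Like the described flow, the sketch scans the remaining rectangles once per block, so a rectangle that 
comes within reach only after the block has grown past a later rectangle is not revisited. 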
[0019] Next, the operation of the physical relationship discernment section 12 is explained. As 
mentioned above, the physical relationship discernment section 12 consists of the line sampling 
processing section 121, the train extract processing section 122, and the table structure storage section 
123, which are explained in order below. In this example, the alphabetic blocks extracted by the 
alphabetic block extract section 11 are registered as configuration frames as they are. (c) of drawing 4 
shows the configuration frames. 
[0020] The line sampling processing section 121 and the train extract processing section 122 regard the
alphabetic block rectangles extracted in the alphabetic block rectangle extract processing section 112 as
the configuration frames which constitute a table, and identify their arrangement. Drawing 6a and
drawing 6b show the flow of line sampling processing, and drawing 7a and drawing 7b show the flow of train
extract processing. As shown in these drawings, the coordinates of the central point of every
configuration frame are found. Line sampling processing identifies as one line of the table the
configuration frames whose central-point Y coordinates agree within an error range, and train extract
processing identifies as one train of the table the configuration frames whose central-point X
coordinates agree within an error range.

[0021] That is, in line sampling processing, as shown in drawing 6a and drawing 6b, the total number of
configuration frames is first set in Variable N (step 601). The Y coordinate of the central point of every
configuration frame is found and stored in Array CB (step 602). The frame with the greatest height among
all rectangle frames is found, and one half of that height is taken as the threshold Th of the error range
(step 603). Next, the array CB of Y coordinates is sorted in ascending order (step 604). Then i=G=0 and
y=CBi are set, and the line array is cleared (step 605). Next, the Y coordinate CBi of one configuration
frame is taken out of Array CB, and it is judged whether that frame is already registered in a line array
(step 607). For a configuration frame CBi which is not registered, it is judged by the operation
|CBi-y|<Th whether its distance from y is within the threshold Th (step 608), and if it is, the
configuration frame corresponding to CBi is stored in the line array (step 609). When the frame is already
registered, or when its distance from y is not within the threshold Th, i is set to i+1 in order to take
out the next frame (step 610). The judgment and registration processing for the same line array (steps
607-609) are then performed on the newly taken-out frame. When processing progresses and i<N no longer
holds (judgment of step 606), the extract processing for one line is completed, and processing proceeds to
the flow of drawing 6b in order to extract the next line. The content of the line array is output as the
G-th line information (step 611). Next, the line array is cleared, and i=0 and G=G+1 are set (step 612).
Then the configuration frame which should serve as the head of the (G+1)-th line is looked for. That is,
the configuration frames are taken out one at a time from the beginning and it is judged whether each is
registered in any line (step 614). The first unregistered configuration frame found is specified as the
frame y which should serve as the head of the (G+1)-th line, i is set to 0 (step 616), and processing
moves to the one-line extract processing of steps 606-610 of drawing 6a. When i<N is judged not to hold in
step 613 (i.e., when no unregistered configuration frame remains), line extract processing ends.
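
The line sampling flow of steps 601-616 might be sketched as below. This is only an illustration under stated assumptions: the function name, the (x, y, width, height) frame format, and returning each line as a list of frame indices are choices made for the example and do not come from the specification.

```python
def extract_lines(frames):
    """Group configuration frames into lines by the Y coordinate of their centres.

    frames: list of (x, y, w, h) boxes. Returns a list of lines, each a list of
    frame indices, in ascending order of centre Y.
    """
    n = len(frames)                                           # step 601
    if n == 0:
        return []
    centers = [y + h / 2 for (_, y, _, h) in frames]          # step 602
    th = max(h for (_, _, _, h) in frames) / 2                # step 603: half the greatest height
    order = sorted(range(n), key=lambda i: centers[i])        # step 604
    registered = [False] * n
    lines = []
    for head in order:                                        # steps 613-616: find next unregistered head
        if registered[head]:
            continue
        y = centers[head]
        line = []
        for i in order:                                       # steps 606-610: scan all frames
            if not registered[i] and abs(centers[i] - y) < th:    # step 608: |CBi - y| < Th
                registered[i] = True
                line.append(i)                                # step 609
        lines.append(line)                                    # step 611: output the G-th line
    return lines
```

Train extraction would follow the same pattern with the centre X coordinate and half the greatest width as the threshold Tw.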

[0022] Train extract processing is as shown in drawing 7a and drawing 7b, and is almost the same as line
sampling processing except that lines and trains are interchanged. That is, the X coordinate of the
central point of every configuration frame is stored in Array CB (step 702) and sorted in ascending order
(step 704). One half of the greatest width among all rectangle frames is taken as the threshold Tw of the
error range (step 703), i=G=0 and x=CBi are set, and the train array is cleared (step 705). Next, the
configuration frames CBi stored in Array CB are taken out one at a time, and for each configuration frame
CBi which is not yet registered it is judged whether its distance from x is within the threshold Tw of the
error range (steps 707-708); if it is, the configuration frame corresponding to CBi is stored in the train
array (step 709). When the extract processing for one train is completed, processing proceeds to the flow
of drawing 7b in order to extract the next train. There, the first unregistered configuration frame, which
should serve as the head of the next train, is looked for (steps 714-715), and if it is found, processing
moves to the one-train extract processing of steps 706-710 of drawing 7a. When no unregistered
configuration frame remains, train extract processing ends.
[0023] As shown in (d) and (e) of drawing 4, the configuration frames are grouped into lines and trains by
the processing of the line sampling processing section 121 and the train extract processing section 122.
The output data are stored in the table structure storage section 123 in a format which, for example,
attaches a line number and a train number to the identification number of each configuration frame, and
become available to an arbitrary system such as a word processor.
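
One possible shape for the stored output is sketched below: a mapping from each configuration frame's identification number to its line number and train number. The function name and the dictionary layout are hypothetical; the specification only says that a line number and a train number are attached to each frame's identification number.

```python
def build_table_structure(lines, trains):
    """Attach (line number, train number) to each configuration frame id.

    lines / trains: lists of groups of frame ids, as produced by line sampling
    and train extract processing. Every frame id is assumed to appear in
    exactly one line and exactly one train.
    """
    line_of = {f: r for r, line in enumerate(lines) for f in line}
    train_of = {f: c for c, train in enumerate(trains) for f in train}
    return {f: (line_of[f], train_of[f]) for f in sorted(line_of)}


structure = build_table_structure([[0, 1], [2, 3]], [[0, 2], [1, 3]])
```

Here `structure` places frames 0 and 1 in line 0, frames 2 and 3 in line 1, and pairs them into two trains, which is the kind of record a downstream system such as a word processor could consume.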

[0024] As explained above, this 1st example uses the alphabetic blocks extracted in the alphabetic block
extract section 11 as the configuration frames of a table, and extracts the structure of the table, which
consists of lines and trains, from their arrangement. It can therefore recognize table structure even for
a table without ruled lines as shown in (a) of drawing 4. In addition, for a table with ruled lines, this
1st example can likewise recognize table structure based only on the alphabetic blocks.
[0025] (The 2nd example) Drawing 8 is a drawing showing the configuration of the 2nd example of this
invention. The table recognition equipment of this example extracts the alphabetic character fields in a
table based on ruled lines, extracts alphabetic blocks within those alphabetic character fields,
investigates the arrangement of the extracted alphabetic blocks, and recognizes the structure of the
table. Like the configuration of the 1st example shown in drawing 1, it has a basic configuration
consisting of the alphabetic block extract section 81, which extracts alphabetic blocks from a table
image, and the physical relationship discernment section 82, which identifies the physical relationship
between the alphabetic blocks extracted by the alphabetic block extract section 81 and generates data
expressing the table structure. Unlike the 1st example, however, the alphabetic block extract section 81
of this 2nd example additionally has, in the stage preceding the alphabetic character rectangle extract
processing section 813, means for finding the alphabetic character fields, consisting of the ruled line
vectorization processing section 811 and the alphabetic character field extract processing section 812.
The ruled line vectorization processing section 811 separates the alphabetic characters and ruled lines in
a table and vectorizes the ruled lines. The alphabetic character field extract processing section 812
extracts, as alphabetic character fields, the rectangle fields in which alphabetic characters should be
written, based on the vector data of the ruled lines obtained by the ruled line vectorization processing
section 811.

[0026] To vectorize only the ruled lines of a table, the image must be divided into the line parts which
constitute the table and the alphabetic character parts. This separation processing can use the same
existing techniques as processing which separates alphabetic characters and line segments in a graphic
form. In addition, if the technique of Japanese Patent Application No. 3-290299, "an alphabetic character
/ graphic form decollator" (inventor Noboru Shimizu), for which the applicant of this patent previously
filed, is used, accurate separation processing with few errors can be performed at higher speed. That
alphabetic character / graphic form decollator is explained briefly. As shown in drawing 9, it is equipped
with the feature-extraction section 91, which extracts two or more features of each black pixel lump in an
input image; the initial cluster core decision section 92, which finds the initial cluster cores using the
feature-extraction result of the feature-extraction section 91; and the field judging section 93, which
judges fields by clustering using the feature-extraction result of the feature-extraction section 91 and
the decision result of the initial cluster core decision section 92. As the characteristic quantities of
each black pixel lump, for example, the lump's area, its oblateness, and the complexity of its border line
can be used. Once such characteristic quantities are extracted in the feature-extraction section 91, the
initial cluster core decision section 92 finds the cores of the initial clusters using the distribution of
the extracted characteristic quantities of the black pixel lumps. The field judging section 93 then
clusters the extracted characteristic quantities of the black pixel lumps using the initial cluster cores
found by the initial cluster core decision section 92, and judges the field to which each black pixel lump
should belong.
[0027] For the ruled line field of the separated table, the binary image is converted into vector data
whose starting points and terminal points are feature points such as endpoints, bend points, crossings,
and junctions. Since the conversion into this vector data can simply use existing techniques (for example,
see Shingaku Giho PRL83-8, PRL85-24, and PRL86-89, JP,2-210586,A, and JP,2-105265,A), explanation is
omitted here.

[0028] Drawing 10a and drawing 10b are drawings showing the flow of extract processing of the alphabetic
character field extract processing section 812. The ruled line vectors acquired in the ruled line
vectorization processing section 811 are divided into vertical ruled lines VR and horizontal ruled lines
HR (step 1001), each kind is counted, and the number of vertical ruled lines is stored in V and the number
of horizontal ruled lines in H (step 1002). The existence of horizontal ruled lines is judged (step 1003);
if there is no horizontal ruled line, the number R of alphabetic character fields is set to 1, and the
field size is set to the size of the input image (step 1009). If there are horizontal ruled lines, the
number R of alphabetic character fields is set to H-1, and i is set to 0 (step 1004). Next, the horizontal
ruled lines are sorted in ascending order of Y coordinate (step 1005). Numbers are then assigned to the
alphabetic character fields in ascending order of Y coordinate. That is, the field bounded by the i-th and
the (i+1)-th horizontal ruled lines is taken as the i-th alphabetic character field (step 1007). When i<R
no longer holds (step 1006), the assignment of numbers is finished, and processing moves to the division
of the alphabetic character fields by the vertical ruled lines shown in drawing 10b.

[0029] The existence of vertical ruled lines is judged (step 1010); if there is no vertical ruled line,
the number R of alphabetic character fields is set to 1, and the field size is set to the size of the
input image (step 1018). If there are vertical ruled lines, the number R of alphabetic character fields
obtained from the horizontal ruled lines by the processing of drawing 10a is moved to R1, R+V-1 is set in
R, and i, j, and k are each set to 0 (step 1011). Next, the vertical ruled lines are sorted in ascending
order of X coordinate (step 1012). Then, for every field divided by the horizontal ruled lines, the fields
into which it is further divided by the vertical ruled lines are found (steps 1014-1017). That is, within
the j-th field divided by the horizontal ruled lines, the field bounded by the i-th and the (i+1)-th
vertical ruled lines is taken as the k-th alphabetic character field (step 1013). This numbering is then
repeated, increasing i and k by 1 each time, until it is judged that no unprocessed field divided by the
vertical ruled lines remains in the j-th field (steps 1014 and 1015). Then, in order to process the next
field divided by the horizontal ruled lines, j is increased by 1 and i is cleared to 0. Steps 1013-1017
are repeated until the last of the fields divided by the horizontal ruled lines has been processed (i.e.,
until it is judged that j<R1 no longer holds). The alphabetic character fields divided by the ruled lines
are extracted as described above, and the result is passed to the alphabetic character rectangle extract
processing section 813.
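
The division into alphabetic character fields described in drawings 10a and 10b might look like the following sketch. It rests on simplifying assumptions that are not in the specification: every ruled line is taken to span the whole table, each line is represented only by its Y or X coordinate, and when one direction has no ruled lines the image extent is used for that direction. The function and parameter names are hypothetical.

```python
def character_fields(h_lines, v_lines, width, height):
    """Split a table image into character fields bounded by ruled lines.

    h_lines: Y coordinates of horizontal ruled lines.
    v_lines: X coordinates of vertical ruled lines.
    Returns fields as (x, y, w, h), numbered band by band from the top.
    """
    # With no ruled lines in a direction, fall back to the image extent (assumption).
    ys = sorted(h_lines) if h_lines else [0, height]   # sort by Y (step 1005)
    xs = sorted(v_lines) if v_lines else [0, width]    # sort by X (step 1012)
    fields = []
    for j in range(len(ys) - 1):          # H-1 bands between horizontal lines
        for i in range(len(xs) - 1):      # V-1 fields inside each band
            fields.append((xs[i], ys[j], xs[i + 1] - xs[i], ys[j + 1] - ys[j]))
    return fields
```

With three horizontal and three vertical ruled lines this yields four fields; with no ruled lines at all it yields a single field covering the input image, matching steps 1009 and 1018.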

[0030] The operation of the processing sections from the alphabetic character rectangle extract processing
section 813 onward is fundamentally the same as in the 1st example. However, alphabetic character
rectangle extract processing and alphabetic block extract processing are performed using the information
on the alphabetic character fields extracted by the alphabetic character field extract processing section
812. Therefore, extraction of alphabetic character rectangles becomes easier and more reliable, and for
alphabetic blocks as well, the error of detecting alphabetic characters which are close together on
opposite sides of a ruled line as one block is eliminated, so that alphabetic blocks can be extracted
reliably.
[0031] (The 3rd example) Drawing 11 is the block diagram showing the 3rd example of this invention. The
table recognition equipment of this example has: the alphabetic character and ruled line separation
processing section 1110, which separates the alphabetic character part and the ruled line part contained
in a table image; the rectangle frame extract section 1120, which extracts the rectangle frames surrounded
by the ruled lines constituting a table from the ruled line image separated by the alphabetic character
and ruled line separation processing section 1110; the alphabetic block extract section 1130, which
extracts the alphabetic block rectangle frames constituting the table from the alphabetic character image
separated by the alphabetic character and ruled line separation processing section 1110; the physical
relationship discernment section 1140, which identifies the physical relationship between the rectangle
frames extracted by the rectangle frame extract section 1120 and the alphabetic blocks extracted by the
alphabetic block extract section 1130, and creates data expressing the structure of the table; and the
table structure storage section 1144, which stores the data showing the structure of the table identified
by the physical relationship discernment section 1140.
[0032] The rectangle frame extract section 1120 has: the ruled line vectorization processing section 1121,
which vectorizes a ruled line image; the perfect rectangle frame extract processing section 1122, which
extracts the perfect rectangle frames surrounded by ruled line vectors, based on the ruled line vectors
output by the ruled line vectorization processing section 1121; and the imperfection rectangle frame
extract processing section 1123, which compensates the missing parts of imperfect rectangle frames, in
which part of a ruled line is omitted so that part of the frame is missing, and uses them as rectangle
frames.

[0033] The alphabetic block extract section 1130 is equipped with the alphabetic character field extract
processing section 1131, the alphabetic character rectangle extract processing section 1132, and the
alphabetic block rectangle extract processing section 1133. The alphabetic character field extract
processing section 1131 determines each field surrounded by a rectangle frame obtained by the rectangle
frame extract section 1120 as an alphabetic character field, cuts out the alphabetic character image from
the alphabetic character and ruled line separation processing section 1110 for every alphabetic character
field, and passes it to the alphabetic character rectangle extract processing section 1132. The
alphabetic character rectangle extract processing section 1132 finds the rectangle field surrounding each
lump of pixels in which an alphabetic character is written. The alphabetic block rectangle extract
processing section 1133 finds the distance between the alphabetic character rectangles found in the
alphabetic character rectangle extract processing section 1132, and unifies all alphabetic character
rectangles whose distance is smaller than a threshold as one alphabetic block.
[0034] Moreover, the physical relationship discernment section 1140 identifies the physical relationship
between the rectangle frames extracted by the rectangle frame extract section 1120 and the alphabetic
blocks extracted by said alphabetic block extract section 1130, and consists of the configuration frame
discernment processing section 1141, the line sampling processing section 1142, the train extract
processing section 1143, and the table structure storage section 1144, which stores their extract results.
The configuration frame discernment processing section 1141 identifies the configuration frames which
constitute a table from among the rectangle frames consisting of ruled lines of the table, extracted by
said rectangle frame extract means, and the alphabetic block rectangle frames extracted by the alphabetic
block extract means. The line sampling processing section 1142 identifies the arrangement in the line
direction of the configuration frames identified in the configuration frame discernment processing section
1141, and the train extract processing section 1143 identifies the arrangement in the train direction of
those configuration frames.

[0035] The operation of this example constituted as mentioned above is explained. The alphabetic character
and ruled line separation processing section 1110 can use the same existing techniques as processing which
separates alphabetic characters and line segments in a graphic form. In addition, if the technique of
Japanese Patent Application No. 3-290299, "an alphabetic character / graphic form decollator", mentioned
in the 2nd example is used, accurate separation processing with few errors can be performed at higher
speed. The information on the ruled line image separated here is output to the rectangle frame extract
section 1120, and the information on the alphabetic character image is output to the alphabetic block
extract section 1130.
[0036] A ruled line image is vectorized in the ruled line vectorization processing section 1121. That is,
the binary image is converted into vector data whose starting points and terminal points are feature
points such as endpoints, bend points, crossings, and junctions. The conversion into this vector data can
simply use the existing techniques shown above. The converted ruled line vector data is passed to the
perfect rectangle frame extract processing section 1122, which extracts the perfect rectangle frames
surrounded by ruled line vectors, and to the imperfection rectangle frame extract processing section 1123,
which compensates the missing parts of imperfect rectangle frames, in which part of a ruled line is
omitted so that part of the frame is missing, and uses them as rectangle frames.

[0037] The perfect rectangle frame extract processing section 1122 takes out the perfect rectangle frames
surrounded by ruled line vectors, based on the ruled line vector data. Drawing 12a and 12b are the flow
charts of the processing. A rectangle frame of a table is formed when vertical vector data are connected
to the right and left ends of one level (horizontal) vector datum and another level vector datum is
connected to their lower ends. Each level vector datum is therefore investigated, and the vector data
which fulfill these conditions are entered in the rectangle frame configuration table shown in drawing 14.
First, the number of all the vector data constituting the table is counted (step 1201). The processing
from step 1202 to step 1212 is applied to all vector data. A level vector datum Vi which can serve as the
top ruled line of a rectangle frame is looked for (step 1203). Level vector data can be found from the
fact that the angle between the vector and a horizontal line is below a threshold. Since the level vector
datum Vi found here may serve as the top ruled line of the k-th rectangle frame, Vi is registered in the
top ruled line column of the k-th rectangle frame in the rectangle frame configuration table 141 (step
1204). Next, the vector data constituting the right-hand side of the rectangle frame Wk is looked for
(step 1205). That is, a vertical vector datum is looked for which has one endpoint touching the right
endpoint of Vi and whose other endpoint, not in contact with Vi, lies below Vi. Vertical vector data can
be found easily from the fact that the angle with a perpendicular is below a threshold. Since the vector
datum found at this step may constitute the right ruled line of the rectangle frame Wk, it is registered
in the right ruled line column of the k-th rectangle frame in the rectangle frame configuration table 141
(step 1206). The left ruled line of the rectangle frame Wk is looked for similarly (step 1207) and
registered in the left ruled line column of the k-th rectangle frame in the rectangle frame configuration
table 141 (step 1208). Furthermore, the vector datum which touches the lower ends of the right and left
ruled lines just found is looked for (step 1209) and registered in the bottom ruled line column of the
k-th rectangle frame in the rectangle frame configuration table 141 (step 1210). When a ruled line is not
found in even one of the above steps, all registrations for the k-th rectangle frame in the rectangle
frame configuration table 141 are canceled, and the entry is reset so that a rectangle frame consisting of
other vector data can be registered. When the above processing is applied to the table of drawing 13, the
rectangle frame configuration table 141 becomes as shown in drawing 14. As another example, for a table
like drawing 15, the rectangle frame configuration table 141 becomes as shown in drawing 16. Furthermore,
for the convenience of later processing, the rectangle frame configuration table 141 is rewritten into
the rectangle frame table 171, which expresses each rectangle frame by the X coordinate and Y coordinate
of its upper left corner, its width, and its height. The rectangle frame configuration table of drawing 14
becomes as shown in (a) of drawing 17.
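
The search for perfect rectangle frames (steps 1203-1210) can be illustrated with the sketch below. It is an assumed reading of the procedure, not the implementation in the specification: segments are given as pairs of (x, y) points with Y increasing downward, and the angle threshold `max_deg`, the endpoint tolerance `tol`, and all function names are hypothetical.

```python
import math


def _angle(seg):
    """Unsigned direction of a segment in degrees, in the range 0..180."""
    (x1, y1), (x2, y2) = seg
    return abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))


def is_level(seg, max_deg=5.0):
    """True when the angle with a horizontal line is below the threshold."""
    a = _angle(seg)
    return min(a, 180 - a) <= max_deg


def is_vertical(seg, max_deg=5.0):
    """True when the angle with a perpendicular is below the threshold."""
    return abs(_angle(seg) - 90) <= max_deg


def near(p, q, tol=2.0):
    """True when two endpoints touch within the tolerance."""
    return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol


def find_perfect_frames(segments, tol=2.0):
    """Return (x, y, w, h) for every frame closed on all four sides."""
    level = [tuple(sorted(s)) for s in segments if is_level(s)]            # (left, right)
    vert = [tuple(sorted(s, key=lambda p: p[1])) for s in segments if is_vertical(s)]  # (top, bottom)
    frames = []
    for top in level:                                                      # step 1203
        left_end, right_end = top
        right = next((v for v in vert if near(v[0], right_end, tol)), None)   # step 1205
        left = next((v for v in vert if near(v[0], left_end, tol)), None)     # step 1207
        if right is None or left is None:
            continue                                                       # registration canceled
        bottom = next((h for h in level if h is not top
                       and near(h[0], left[1], tol) and near(h[1], right[1], tol)), None)  # step 1209
        if bottom is None:
            continue
        x, y = left_end
        frames.append((x, y, right_end[0] - x, left[1][1] - y))
    return frames
```

The returned tuples correspond to the rectangle frame table form (upper left X, upper left Y, width, height); a frame with any side missing is simply skipped here and would be left to the imperfection rectangle frame processing.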

[0038] The imperfection rectangle frame extract processing section 1123 compensates the missing parts of
imperfect rectangle frames, in which part of a ruled line is omitted so that part of the frame is missing,
and takes them out as rectangle frames of the table. Drawing 18a and 18b are the flow charts of the
processing. First, the non-registered vectors, that is, the vectors other than the vector data registered
in the rectangle frame configuration table as elements of the rectangle frames extracted by the perfect
rectangle frame extract processing section 1122, are extracted (steps 1801-1806). To this end, the total
number of vector data is first set in N, and i and k are cleared to 0 (step 1801). Vector data Vi is taken
out and it is checked whether it is registered in the rectangle frame configuration table (step 1803); if
it is not registered, it is registered in the vector train VV (step 1804) and counted with Counter k (step
1805). Then, in order to take out the next vector, i is set to i+1 (step 1806). When vector data Vi is
registered in the rectangle frame configuration table, processing moves directly to the next vector (step
1806). When the registration of non-registered vectors has finished for all vectors, namely when i reaches
N (step 1802), a horizontal or vertical vector which connects the two nearest endpoints within the
non-registered vector train VV is added as compensation (step 1807). The number of vectors added is set to
n. The total k of vectors is set to k+n, and i and m are cleared to 0 (step 1808). A level vector datum
VVi which can serve as the top ruled line of a rectangle frame is looked for (step 1810). Level vector
data can be found from the fact that the angle between the vector and a horizontal line is below a
threshold. Since the level vector datum VVi found here may serve as the top ruled line of the m-th
rectangle frame Wm, VVi is registered in the top ruled line column of the m-th rectangle frame in an
imperfection rectangle frame configuration table (step 1811). Next, the vector data constituting the
right-hand side of the rectangle frame Wm is looked for (step 1812). That is, a vertical vector datum is
looked for which has one endpoint touching the right endpoint of VVi and whose other endpoint, not in
contact with VVi, lies below VVi. Vertical vector data can be found easily from the fact that the angle
with a perpendicular is below a threshold. Since the vector datum found at this step may constitute the
right ruled line of the rectangle frame Wm, it is registered in the right ruled line column of the m-th
rectangle frame in the imperfection rectangle frame configuration table (step 1813). The left ruled line
of the rectangle frame Wm is looked for similarly (step 1814) and registered in the left ruled line column
of the m-th rectangle frame in the imperfection rectangle frame configuration table (step 1815).
Furthermore, the vector datum which touches the lower ends of the right and left ruled lines just found is
looked for (step 1816) and registered in the bottom ruled line column of the m-th rectangle frame Wm in
the imperfection rectangle frame configuration table (step 1817). When a ruled line is not found in even
one of the above steps, all registrations for the m-th rectangle frame Wm in the imperfection rectangle
frame configuration table are canceled, and the entry is reset so that a rectangle frame consisting of
other vector data can be registered. (b) of drawing 20 shows an example of an imperfection rectangle frame
configuration table, which expresses the imperfection rectangle frame part of the table of drawing 19.
Furthermore, the imperfection rectangle frame configuration table is rewritten into a rectangle frame
table that expresses it by the X
coordinate of the upper left corner of each rectangle frame, the Y coordinate, the width, and the height, which is convenient for subsequent processing.

[0039] Next, the processing of the alphabetic block extract section 1130 is explained. The alphabetic
character field extract processing section 1131 finds the rectangles which are divided by the ruled lines
in a table and in which alphabetic characters should be written, and registers them in an alphabetic
character field table. In this example, since the rectangle frames have already been extracted by the
perfect rectangle frame extract processing section and the imperfection rectangle frame extract processing
section, it suffices to register those frames in the alphabetic character field table. In the example of
drawing 19, two alphabetic character fields surrounded by perfect rectangle frames and four alphabetic
character fields within imperfection rectangle frames are obtained. As another example, for a table which
lacks some ruled lines as shown in (a) of drawing 21, this processing compensates the ruled lines as shown
in (b) of drawing 21, and extracts the alphabetic character field 211, which includes two or more
alphabetic blocks, as shown in (c) of drawing 21. The subsequent processing proceeds for every alphabetic
character field found here. By obtaining alphabetic character fields in this way and extracting
alphabetic blocks for every alphabetic character field, the extraction of an alphabetic block which
straddles a ruled line can be prevented.

[0040] Next, the alphabetic character rectangle extract processing section 1132 is explained. Here, the
rectangle field surrounding each lump of pixels in which an alphabetic character is written is found from
each alphabetic character field found in the alphabetic character field extract processing section 1131.
That is, when the image of a table is written with a pixel value of 0 for the background and a pixel value
of 1 for alphabetic characters and lines, each lump of pixels whose value is 1 is taken out and its
rectangle field is found. When two rectangle fields overlap, they are expressed by the rectangle field 39
which includes the two rectangle fields 37 and 38, as shown in (b) of drawing 3. In addition, since the
method of extracting the rectangle field of an alphabetic character is an existing technique, detailed
explanation is omitted.
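
Since the specification leaves this step to existing techniques, the following is only one common way to do it, offered as an illustration: connected-component labelling by breadth-first search, followed by merging of overlapping rectangles. The choice of 8-connectivity, the function names, and the list-of-rows image format are assumptions.

```python
from collections import deque


def overlaps(a, b):
    """True when two (x, y, w, h) boxes share some area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def merge_overlapping(boxes):
    """Replace overlapping boxes by their enclosing box until none overlap."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
                    x1 = max(a[0] + a[2], b[0] + b[2])
                    y1 = max(a[1] + a[3], b[1] + b[3])
                    boxes[i] = (x0, y0, x1 - x0, y1 - y0)
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


def character_rectangles(image):
    """Bounding boxes of lumps of 1-pixels in a binary image (list of rows)."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] != 1 or seen[sy][sx]:
                continue
            x0 = x1 = sx
            y0 = y1 = sy
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:                      # flood-fill one lump
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):         # 8-connected neighbours (assumption)
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return merge_overlapping(boxes)
```

The merge step mirrors the case in (b) of drawing 3, where two overlapping rectangle fields are replaced by one rectangle field that includes both.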

[0041] Furthermore, the alphabetic block rectangle extract processing section 1133 finds the distance
between the alphabetic character rectangles found in the alphabetic character rectangle extract
processing section 1132, and unifies all alphabetic character rectangles whose distance is smaller than a
certain threshold as one alphabetic block. The detail of the processing is basically the same as the
processing of the alphabetic block rectangle extract processing section 112 in the 1st example, and is
shown in the flow chart of drawing 5a and drawing 5b. Since this flow chart was already explained in the
1st example, explanation is omitted here. However, whereas the 1st example judged only by said threshold
whether to unify rectangles into a block, this example refers to the information on the alphabetic
character fields and unifies rectangles into one alphabetic block only when they are in the same
alphabetic character field. This prevents the extraction of an alphabetic block which straddles a ruled
line.

[0042] Finally, the operation of the physical relationship discernment section 1140 is explained. The 
physical relationship discernment section 1140 consists of three processing sections: the configuration 
frame discernment processing section 1141, the line sampling processing section 1142, and the train 
extract processing section 1143. Configuration frame discernment processing identifies and selects 
whether each frame that actually constitutes the structure of the table is a rectangle frame consisting of 
ruled lines of the table or the frame of an alphabetic block. On the premise that at least one alphabetic 
block exists in the interior of a rectangle frame, the number of alphabetic blocks in the interior of the 
rectangle frame is counted; when two or more alphabetic blocks are found, the alphabetic blocks are 
registered as configuration frames, and when only one alphabetic block exists, the rectangle frame is 
registered as a configuration frame. 
[0043] Drawing 22 is a flow chart showing the details of the above-mentioned configuration frame 
discernment processing. The total number of perfect rectangle frames is set as N and the total number of 
alphabetic blocks is set as M; the variable i, which specifies the array element of the perfect rectangle 
frame W, the variable C, which specifies the array element of the identified configuration frame, and the 
variable s, which counts the alphabetic blocks contained in each perfect rectangle frame, are each set to 
0 (step 2201). The variable j, which specifies the array element of an alphabetic block CBj, and the 
variable s are set to 0 (step 2202). The alphabetic block CBj is taken out and it is judged whether it is 
contained in the perfect rectangle frame Wi (step 2203); when it is contained, s is incremented (step 
2204), and nothing is done when it is not contained. Then j is incremented in order to take out the next 
alphabetic block (step 2205). The above processing through step 2205 is repeated for the alphabetic 
blocks one by one until it is judged that no unprocessed alphabetic block remains (step 2206). When all 
alphabetic blocks have been examined for one perfect rectangle frame in this way, it is judged whether 
the number s of alphabetic blocks contained in the perfect rectangle frame is two or more (step 2207); 
when it is two or more, the s alphabetic blocks contained in the perfect rectangle frame Wi are registered 
as configuration frames (step 2208). Since s frames were registered, C is set to C+s (step 2209). On the 
other hand, when the number s of alphabetic blocks contained in the perfect rectangle frame Wi is 1, the 
perfect rectangle frame Wi is registered as the Cth configuration frame (step 2210), and C is incremented 
(step 2211). When the configuration frames related to one perfect rectangle frame have been identified 
by the above processing, i is set to i+1 in order to perform the same processing on the next perfect 
rectangle frame, and processing returns to step 2202. All processing is completed when it is judged that 
i<N no longer holds (step 2213). An example of the result of extracting configuration frames is shown in 
(a) and (b) of drawing 23. In this drawing, (a) shows an example of a table, and (b) shows the 
configuration frames extracted from the table of (a). 
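
The counting rule of [0042] and [0043] can be summarized in the following sketch. It is a simplified reading of the text, not a transcription of drawing 22: the names `contains` and `identify_configuration_frames` are hypothetical, rectangles are assumed to be (x0, y0, x1, y1) tuples, and skipping a perfect rectangle frame that contains no alphabetic block is an assumption, since the text only describes the cases of one block and of two or more blocks.

```python
def contains(frame, block):
    """True when rectangle `block` lies entirely inside rectangle `frame`."""
    return (frame[0] <= block[0] and frame[1] <= block[1]
            and block[2] <= frame[2] and block[3] <= frame[3])


def identify_configuration_frames(frames, blocks):
    """Return the configuration frames for perfect rectangle frames W and blocks CB."""
    configuration = []
    for frame in frames:                                       # loop over Wi
        inside = [b for b in blocks if contains(frame, b)]     # count s for Wi
        if len(inside) >= 2:
            # Two or more blocks: the blocks themselves are registered.
            configuration.extend(inside)
        elif len(inside) == 1:
            # Exactly one block: the ruled-line rectangle frame is registered.
            configuration.append(frame)
        # No block inside: nothing is registered (assumption, see lead-in).
    return configuration
```

The list `configuration` plays the role of the configuration frame array indexed by C; appending s blocks or one frame corresponds to setting C to C+s or incrementing C.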

[0044] The arrangement of the frames constituting the table, which were extracted in the configuration 
frame discernment processing section 1141, is identified by the line sampling processing section 1142 
and the train extract processing section 1143. That is, the coordinates of the central point of every 
configuration frame are found; in the line sampling processing section 1142, the configuration frames 
whose central points line up with the same Y coordinate within an error range are identified as a row of 
the table, and in the train extract processing section 1143, the configuration frames whose central points 
line up with the same X coordinate within an error range are identified as a column of the table. As for 
the details of this processing, the line sampling processing flow is shown in drawing 6 a and drawing 6 b, 
and the train extract processing flow is shown in drawing 7 a and drawing 7 b. Detailed explanation of 
these processings is the same as that given for the 1st example. The result of the line sampling 
processing 152 and the train extract processing 153 is shown in drawing 24 and drawing 25. 
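
The grouping by central point described in [0044] might be sketched as below. The actual flows are those of drawing 6 a to drawing 7 b, which are not reproduced here; the names `center` and `group_by_center`, the `tolerance` parameter standing for the error range, and the choice to compare each frame with the first frame of the current group are assumptions of this sketch.

```python
def center(rect):
    """Central point (x, y) of a rectangle (x0, y0, x1, y1)."""
    return ((rect[0] + rect[2]) / 2, (rect[1] + rect[3]) / 2)


def group_by_center(frames, axis, tolerance):
    """Group frame indices whose central points agree along one axis.

    axis=1 compares Y coordinates (rows); axis=0 compares X coordinates (columns).
    """
    order = sorted(range(len(frames)), key=lambda i: center(frames[i])[axis])
    groups = []  # list of (reference coordinate, member indices)
    for i in order:
        c = center(frames[i])[axis]
        if groups and abs(c - groups[-1][0]) <= tolerance:
            groups[-1][1].append(i)
        else:
            groups.append((c, [i]))
    return [members for _, members in groups]


# Usage: four configuration frames forming a 2 x 2 table.
frames = [(0, 0, 40, 20), (60, 2, 100, 22), (0, 40, 40, 60), (60, 41, 100, 61)]
rows = group_by_center(frames, axis=1, tolerance=5)
columns = group_by_center(frames, axis=0, tolerance=5)
```

Because the groups come out sorted by coordinate, their positions in `rows` and `columns` can serve directly as row and column numbers.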

[0045] After performing such processing, numbers are assigned in order along the extracted rows and 
columns, and the table data of a word processor is described according to this numbering, which makes 
it possible to convert the table inputted as an image into a table that can be edited with a word 
processor. Moreover, by cutting out alphabetic characters using the configuration frames, it also 
becomes possible to input a table that lacks ruled lines into a character reader easily. For example, as 
shown in the table in (a) of drawing 24, when both sides of a table lack ruled lines, the 2nd and 3rd 
columns of the table are extracted as rectangle frames consisting of ruled lines, and the remainder are 
extracted as configuration frames that are alphabetic blocks, as shown in (b) of drawing 24. Moreover, 
as shown in the table in (a) of drawing 25, when all of the vertical ruled lines and a part of the horizontal 
ruled lines are omitted, all configuration frames are alphabetic blocks, as shown in (b) of that drawing. 
Moreover, as shown in the table in (a) of drawing 26, when a part of the vertical ruled lines and the 
horizontal ruled lines of a table is omitted, rectangle frames consisting of ruled lines can be extracted 
from the table, but since two or more alphabetic blocks are further included in their interior, all of the 
configuration frames become alphabetic blocks, as shown in (b) of drawing 26. As shown in drawing 24, 
drawing 25, and drawing 26, this example can take out the row and column structure of a table 
accurately for any of these types of table. Although this example was explained for tables in which at 
least some ruled lines are written, it can also be applied to a table that includes no ruled lines at all by 
performing the same processing. Furthermore, it is also possible to add the structure of a table to text 
that is not clearly written as a table, for example, a document consisting of an itemized statement. 
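
The numbering along rows and columns mentioned in [0045] can be illustrated as follows. This is a sketch under stated assumptions: `build_grid` and `to_tsv` are hypothetical names, `rows` and `columns` are lists of configuration frame indices already grouped and ordered, `texts` holds a recognized string per configuration frame (character recognition itself is outside this sketch), and tab-separated text merely stands in for the table data format of a word processor, which the text does not specify.

```python
def build_grid(rows, columns, texts):
    """Place the text of each configuration frame at its (row, column) number."""
    row_of = {i: r for r, members in enumerate(rows) for i in members}
    col_of = {i: c for c, members in enumerate(columns) for i in members}
    grid = [["" for _ in columns] for _ in rows]
    for i, text in enumerate(texts):
        grid[row_of[i]][col_of[i]] = text
    return grid


def to_tsv(grid):
    """Serialize the grid as tab-separated lines (stand-in for editable table data)."""
    return "\n".join("\t".join(row) for row in grid)
```

Cells for which no configuration frame was found simply stay empty, so a table with blank entries still keeps its row and column structure.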
[0046] 

[Effect of the Invention] According to this invention, the physical relationship between the alphabetic 
blocks extracted by the alphabetic block extract means is identified by the physical relationship 
discernment means. Since the alphabetic blocks in a table, as components of the table, are generally in a 
regularly aligned physical relationship, the structure of the table can be recognized by examining the 
physical relationship between the alphabetic blocks. Conventionally, the frames constituting a table were 
found by paying attention only to the ruled lines of the table, so there was a problem that the structure of 
a table in which a part or all of the vertical and horizontal ruled lines are omitted could not be recognized 
accurately. Since this invention recognizes the structure of a table using the arrangement of the 
alphabetic blocks, that problem can be solved. 

[0047] Moreover, according to the mode of this invention in which the alphabetic character field extract 
means is provided, the alphabetic character field extract means extracts alphabetic character fields using 
the information on the ruled lines, and an alphabetic block is extracted for every alphabetic character 
field. Therefore, there is no possibility that alphabetic character rectangles lying close together on 
opposite sides of a ruled line are extracted as one block; alphabetic blocks can be extracted with 
sufficient precision, and as a result the structure of a table can be recognized accurately. 

[0048] In the mode of this invention in which a rectangle frame extract means that extracts the rectangle 
frames surrounded by ruled lines is provided, and in which the physical relationship discernment means 
treats the rectangle frames surrounded by the ruled lines of a table and the alphabetic blocks in the table 
equally and identifies the physical relationship of each, a frame in the table that is not surrounded by 
ruled lines is still identified, as an alphabetic block, as one component of the table. Therefore, a table in 
which a part or all of the ruled lines are omitted can be recognized as accurately as a table in which all of 
the ruled lines are present. 



DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] Drawing showing the configuration of the 1st example of this invention 
[Drawing 2] (a) - (e) are drawings showing examples of tables used in documents. 
[Drawing 3] (a) and (b) are drawings showing examples of alphabetic character rectangles. 

[Drawing 4] (a) - (e) are drawings for explaining recognition of a table in which all ruled lines are 
omitted. 

[Drawing 5 a] Drawing showing the flow of extract processing of an alphabetic block 

[Drawing 5 b] Drawing showing the flow of extract processing of an alphabetic block (continuation of 
drawing 5 a) 

[Drawing 6 a] Drawing showing the flow of line sampling processing 

[Drawing 6 b] Drawing showing the flow of line sampling processing (continuation of drawing 6 a) 
[Drawing 7 a] Drawing showing the flow of train extract processing 

[Drawing 7 b] Drawing showing the flow of train extract processing (continuation of drawing 7 a) 
[Drawing 8] Drawing showing the configuration of the 2nd example of this invention 
[Drawing 9] 

[Drawing 10 a] Drawing showing the flow of the alphabetic character field extract processing in the 2nd 
example 

[Drawing 10 b] Drawing showing the flow of the alphabetic character field extract processing in the 2nd 
example (continuation of drawing 10 a) 

[Drawing 11] Drawing showing the configuration of the 3rd example of this invention 

[Drawing 12 a] Drawing showing the processing flow of the perfect rectangle frame extract processing 
section 

[Drawing 12 b] Drawing showing the processing flow of the perfect rectangle frame extract processing 
section (continuation of drawing 12 a) 

[Drawing 13] Drawing showing the example of the vector data which constitutes a table 

[Drawing 14] Drawing showing an example of a rectangle frame configuration table 

[Drawing 15] Drawing showing other examples of the vector data which constitutes a table 

[Drawing 16] Drawing showing other examples of a rectangle frame configuration table 

[Drawing 17] Drawing showing an example of (a) a rectangle frame table and (b) an alphabetic character 
field table 

[Drawing 18 a] Drawing showing the flow of imperfection rectangle frame extract processing 
[Drawing 18 b] Drawing showing the flow of imperfection rectangle frame extract processing 
(continuation of drawing 18 a) 

[Drawing 19] Drawing showing the example of the vector data which constitutes the table with which 
some ruled lines were omitted 

[Drawing 20] Drawing showing an example of the rectangle frame configuration table corresponding to 
the table of drawing 19; (a) shows a perfect rectangle frame configuration table, and (b) shows an 
imperfection rectangle frame configuration table. 
[Drawing 21] Drawing for explaining the extract of an alphabetic character field 
[Drawing 22] Drawing showing the flow of processing of the configuration frame discernment 
processing section 

[Drawing 23] Drawing showing an example of the extraction of configuration frames, lines (rows), and 
trains (columns); (a) shows an example of a table in which the right-end vertical ruled line is omitted, 
(b) the extracted configuration frames, (c) the extracted lines, and (d) the extracted trains. 

[Drawing 24] Drawing showing another example of the extraction of configuration frames, lines (rows), 
and trains (columns); (a) shows an example of a table in which the vertical ruled lines at the left and 
right ends are omitted, (b) the extracted configuration frames, (c) the extracted lines, and (d) the 
extracted trains. 

[Drawing 25] Drawing showing another example of the extraction of configuration frames, lines (rows), 
and trains (columns); (a) shows an example of a table in which all vertical ruled lines are omitted, 
(b) the extracted configuration frames, (c) the extracted lines, and (d) the extracted trains. 

[Drawing 26] Drawing showing another example of the extraction of configuration frames, lines (rows), 
and trains (columns); (a) shows an example of a table in which a part of the vertical ruled lines and 
horizontal ruled lines is omitted, (b) the extracted configuration frames, (c) the extracted lines, and 
(d) the extracted trains. 
[Description of Notations] 

11, 81 - The alphabetic block extract section, 111, 813 - The alphabetic character rectangle extract 
processing section, 112, 814 - The alphabetic block rectangle extract processing section, 12, 82 - The 
physical relationship discernment section, 121, 821 - The line sampling processing section, 122, 822 - 
The train extract processing section, 123, 823 - The table structure storage section, 811 - The ruled line 
vectorization processing section, 812 - The alphabetic character field extract processing section, 1110 - 
The alphabetic character and ruled line separation processing section, 1120 - The rectangle frame 
extract section, 1121 - The ruled line vectorization processing section, 1122 - The perfect rectangle 
frame extract processing section, 1123 - The imperfection rectangle extract processing section, 1130 - 
The alphabetic block extract section, 1132 - The alphabetic character rectangle extract processing 
section, 1133 - The alphabetic block rectangle extract processing section, 1140 - The physical 
relationship discernment section, 1141 - The configuration frame discernment processing section, 
1142 - The line sampling processing section, 1143 - The train extract processing section, 1144 - The 
table structure storage section, 31, 32, 33, 34 - A black pixel lump, 35, 36, 39 - An alphabetic character 
rectangle, 161 - A configuration frame, 171 - A rectangle frame table, 172 - An alphabetic character 
field table, 211 - An alphabetic character field. 